Why Monte Carlo?

Elizabeth King
Kevin Middleton

Monte Carlo Methods

A general term for any method that relies on random sampling

Random Sampling

  • Values from a distribution (e.g. a normal distribution)
  • Values from a dataset
  • Values from a set of possible parameters
  • A random order for a set of values
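library(tidyverse)

# Draw 1000 values from a standard normal distribution
rand_vals <- tibble(Random = rnorm(1000))

# Histogram of the sampled values
ggplot(rand_vals, aes(Random)) +
  geom_histogram(bins = 100)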
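
The other three kinds of random sampling listed above can be sketched with base R's `sample()`. This is a minimal illustration only: the vector `obs` and the grid `slopes` are hypothetical stand-ins, not data from these slides.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical data set of eight measurements
obs <- c(4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.9)

# Values from a dataset: resample with replacement (as in a bootstrap)
resampled <- sample(obs, size = length(obs), replace = TRUE)

# Values from a set of possible parameters: draw one candidate slope
# from a hypothetical grid
slopes <- seq(-2, 2, by = 0.5)
one_slope <- sample(slopes, size = 1)

# A random order for a set of values: shuffle without replacement
shuffled <- sample(obs)
```

With `replace = TRUE` some values can appear more than once and others not at all, while `sample(obs)` returns every value exactly once in a new order.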

Uncertainty is Inherent to Data

  • Our observations are finite samples from a larger population
    • More uncertainty in smaller samples (law of large numbers)
  • Our measurements are imperfect
    • More uncertainty for less accurate & precise measurements
  • Understanding the properties of random sampling is how we can estimate and account for this uncertainty
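
One way to see the effect of sample size is to simulate it. The sketch below assumes a standard normal population and arbitrary sample sizes (both are illustrative choices, not from the slides) and measures how much the sample mean varies from one random sample to the next.

```r
set.seed(2)  # arbitrary seed so the sketch is reproducible

# Standard deviation of the sample mean across many simulated samples
# of size n, drawn from a standard normal population (an assumption)
sd_of_means <- function(n, n_reps = 5000) {
  means <- replicate(n_reps, mean(rnorm(n)))
  sd(means)
}

sample_sizes <- c(5, 20, 80, 320)
spread <- sapply(sample_sizes, sd_of_means)

data.frame(n = sample_sizes, sd_of_mean = spread)
```

The spread of the sample means should shrink as `n` grows, roughly in proportion to `1 / sqrt(n)`.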

A real example

Are male and female jackal mandible lengths different?

Mandible lengths of female and male jackals from the Natural History Museum (London).

n = 20; Difference of means = -4.8

How big of a difference can happen with random sampling in our dataset?

Difference in means for this sample: 2

Sample over and over

Observed difference = -4.8

Proportion of randomized differences more extreme than the observed

Computed as the proportion of randomized differences that are less than or equal to the observed difference of means (in R, the mean of a logical vector).

Empirically determined P-value is 0.0018.
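
The randomization procedure described above can be sketched as follows. This is not necessarily the code behind the slides. The mandible values are assumed from the published jackal data set and reproduce the group means and t statistic reported here. The helper `mean_diff()`, the seed, and the number of iterations are hypothetical choices.

```r
set.seed(3)  # arbitrary seed so the sketch is reproducible

# Mandible lengths; values assumed from the published data set and
# consistent with the group means reported in the slides
M <- data.frame(
  Sex = rep(c("F", "M"), each = 10),
  Mandible = c(110, 111, 107, 108, 110, 105, 107, 106, 111, 111,
               120, 107, 110, 116, 114, 111, 113, 117, 114, 112)
)

# Difference of means, female minus male
mean_diff <- function(values, groups) {
  mean(values[groups == "F"]) - mean(values[groups == "M"])
}

obs_diff <- mean_diff(M$Mandible, M$Sex)

# Shuffle the Sex labels many times, recomputing the difference each time
n_iter <- 10000
rand_diffs <- replicate(n_iter, mean_diff(M$Mandible, sample(M$Sex)))

# One-tailed empirical P-value: share of shuffled differences that are
# less than or equal to the observed difference
p_emp <- mean(rand_diffs <= obs_diff)
```

Because the shuffles are random, `p_emp` will vary slightly between runs and seeds, so it should be close to, but not exactly, the value reported above.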

The expectation from random sampling underlies our statistical tests

t.test(Mandible ~ Sex, data = M)

    Welch Two Sample t-test

data:  Mandible by Sex
t = -3.4843, df = 14.894, p-value = 0.00336
alternative hypothesis: true difference in means between group F and group M is not equal to 0
95 percent confidence interval:
 -7.738105 -1.861895
sample estimates:
mean in group F mean in group M 
          108.6           113.4 

Why would you need Monte Carlo Methods?

Your Data Set Violates the Assumption(s) of Parametric Tests

  • Very common issue
  • Many Monte Carlo methods are “distribution-free”
  • Monte Carlo is almost always preferred to a “non-parametric test”, such as:
    • Sign test
    • Mann-Whitney U (Wilcoxon rank-sum)
    • Kruskal-Wallis

There is not a standard way to estimate a confidence interval

  • Our usual methods of estimating confidence intervals assume a particular distribution
    • e.g., what is the CI on a proportion?
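
As one possible answer to the proportion question, a percentile bootstrap resamples the observed outcomes and reads the interval off the resampled proportions. The data below (7 successes in 25 trials) are hypothetical, and the percentile interval is only one of several bootstrap intervals, chosen here for simplicity.

```r
set.seed(4)  # arbitrary seed so the sketch is reproducible

# Hypothetical data: 7 successes out of 25 trials
outcomes <- c(rep(1, 7), rep(0, 18))

# Resample the outcomes with replacement and recompute the proportion
n_boot <- 10000
boot_props <- replicate(n_boot, mean(sample(outcomes, replace = TRUE)))

# Percentile bootstrap 95% interval
ci <- quantile(boot_props, probs = c(0.025, 0.975))
```

No distribution is assumed for the proportion itself. The interval comes directly from the spread of the resampled values.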

You aren’t sure the analysis you are doing is OK

  • You are using an analysis or the results in a non-typical way
  • You want to know the rate of decision errors
    • false positives
    • false negatives
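
A simulation experiment can estimate these error rates directly. The sketch below assumes a skewed (exponential) population, ten observations per group, and a Welch t-test. All three are illustrative choices, and `one_p_value()` is a hypothetical helper.

```r
set.seed(5)  # arbitrary seed so the sketch is reproducible

# Simulate one experiment in which the null is true: both groups come
# from the same skewed (exponential) population, an illustrative choice
one_p_value <- function(n_per_group = 10) {
  a <- rexp(n_per_group)
  b <- rexp(n_per_group)
  t.test(a, b)$p.value
}

n_sims <- 5000
p_values <- replicate(n_sims, one_p_value())

# Proportion of simulations that (wrongly) reject at alpha = 0.05
false_positive_rate <- mean(p_values < 0.05)
```

The false negative rate can be estimated the same way by giving the two groups a true difference and counting how often the test fails to reject.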

Your question isn’t answered by a typical statistical test

  • Groups can differ due to multiple causes, only some of which are of interest
    • Standard statistical tests account only for sampling error in the null

Why would you need Monte Carlo Methods?

  • Monte Carlo Methods provide a toolkit to ask very practical questions
    • Do simulation experiments to find out what happens